Multiple myeloma (MM) is the second most common hematologic malignancy in the United States, with approximately 32,000 new cases and 12,000 deaths annually (Siegel et al., 2021). Despite therapeutic advances, disparities in MM treatment persist, particularly among patients with lower income or inadequate insurance coverage and those living far from treatment centers (Bodanapu et al., 2024). These socioeconomic and geographic factors are associated with reduced access to life-saving therapies and poorer survival outcomes (Derman et al., 2020; Costa et al., 2017), yet existing risk models focus primarily on clinical and demographic characteristics (Greipp et al., 2005; Palumbo et al., 2015).

To address this gap, we developed a machine learning (ML) model incorporating detailed demographic, clinical, socioeconomic, and geographic variables to predict suboptimal treatment in patients with newly diagnosed MM. Using a previously validated approach (La et al., 2023), we extracted patient-level data on 4,887 incident MM cases diagnosed between 2012 and 2022 from the nationwide Veterans Affairs (VA) healthcare system. Predictors included age group, gender, race, ethnicity, smoking status, International Staging System stage, Charlson Comorbidity Index (CCI), income, distance from a VA hospital, and the Area Deprivation Index (ADI). The ADI is a scientifically validated measure of the adverse social exposome (i.e., neighborhood disadvantage) that can be used to evaluate and improve factors that impact health across populations (Kind & Buckingham, 2018). Our models were trained to predict the likelihood of suboptimal treatment, defined as failure to receive a frontline triplet or quadruplet therapy regimen and/or failure to undergo autologous stem cell transplant when eligible based on age, comorbidity, and insurance criteria. During pre-processing, data were split into training/tuning (80%) and testing (20%) sets. Missing values, under both random and non-random missingness patterns, were imputed using multivariate imputation by chained equations.
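The 80/20 split described above can be illustrated with a minimal sketch. It assumes the split is stratified on the outcome so that both sets keep a similar proportion of suboptimally treated patients; the abstract does not state whether stratification was used, and the function name, seed, and toy labels below are hypothetical. The sketch is in plain Python for illustration and is not the authors' R/caret pipeline.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping the class ratio similar in both sets.

    Stratification is an assumption here; the source only reports an 80/20 split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                          # random order within each class
        n_test = round(len(indices) * test_fraction)  # 20% of each class goes to testing
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical outcome labels: 1 = suboptimal treatment, 0 = otherwise.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
```

Splitting before imputation and tuning, as the text describes, keeps the testing set from influencing any fitted quantity.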

We compared several ML algorithms, including logistic regression, random forest, and extreme gradient boosting (XGBoost), in R v4.4.1 (R Core Team, 2024) with the caret package. Hyperparameters for each model were tuned using 5-fold cross-validation to optimize the area under the receiver operating characteristic curve (AUC). Bootstrap resampling with 200 replicates was used to estimate uncertainty in test set performance. Performance metrics included AUC (mean and standard deviation), calibration plots, sensitivity, specificity, and F1 score.
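As a rough illustration of the bootstrap step above, the following sketch resamples a test set with replacement 200 times and reports the mean and standard deviation of the AUC. It is written in plain Python, not the R/caret workflow the study used; the function names, the seed, and the toy labels and scores are hypothetical, and skipping single-class resamples is an assumption of this sketch.

```python
import random
from statistics import mean, stdev

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, scores, n_replicates=200, seed=0):
    """Mean and SD of AUC over bootstrap resamples of the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_replicates:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # assumption: redraw resamples that contain only one class
        aucs.append(auc(ys, [scores[i] for i in idx]))
    return mean(aucs), stdev(aucs)

# Hypothetical test-set labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6]
point_estimate = auc(y_true, scores)
auc_mean, auc_sd = bootstrap_auc(y_true, scores)
```

The pairwise AUC used here is quadratic in sample size, which is fine for a toy example; a rank-based formula would be preferable for thousands of patients.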

All models yielded similar discrimination, achieving mean AUCs of 0.66-0.69 and 0.66-0.67 on the training and testing sets, respectively. However, the XGBoost model was better calibrated on decile calibration plots, so we report XGBoost results here. The XGBoost model displayed a mean AUC of 0.66 (SD = 0.02) and, at the optimal threshold of 0.57, accuracy of 0.73 (SD = 0.02), specificity of 0.74 (SD = 0.06), sensitivity of 0.12 (SD = 0.02), and F1 score of 0.20 (SD = 0.03). Among XGBoost predictors, older age (65+) and moderate to severe CCI had the highest gain-based importance (>10 units of loss reduction). The ADI third-quintile category ranked 6th, and the income categories ($25k-50k, $50k-75k, and $75k-100k) ranked 8th through 10th, in the same gain-based feature analysis.
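The threshold-dependent metrics reported above all derive from a single confusion matrix. The sketch below shows those standard definitions in plain Python; the 0.57 default mirrors the threshold in the text, while the function name and the toy labels and probabilities are hypothetical and are not study data.

```python
def threshold_metrics(y_true, probs, threshold=0.57):
    """Accuracy, sensitivity, specificity, and F1 after thresholding probabilities."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # report 0 when a ratio is undefined

    sensitivity = ratio(tp, tp + fn)  # share of true positives that are flagged
    specificity = ratio(tn, tn + fp)  # share of true negatives that are cleared
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, len(y_true))
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Hypothetical labels (1 = suboptimal treatment) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.5]
metrics = threshold_metrics(y_true, probs)
```

Unlike these four metrics, AUC is computed across all thresholds and does not change with the chosen cutoff.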

Our findings demonstrate that integrating social determinants and geographic access into ML models aids in identifying MM patients at risk of suboptimal therapy in a large real-world healthcare system. This predictive approach could support the development of targeted, individualized interventions to reduce treatment disparities and improve patient outcomes. Our results further suggest that socioeconomic and geographic factors may influence treatment patterns to an equal or greater extent than traditional clinical variables. External validation using data from urban academic and community health centers is planned, alongside prospective studies to evaluate deployment of tailored patient navigation resources guided by this model.